Abstract

Non-standard distributional approximations have received considerable attention 
in recent years. They often provide more accurate approximations in small samples, 
and theoretical improvements in some cases. This paper shows that the seemingly 
unrelated “many instruments asymptotics” and “small bandwidth asymptotics” share 
a common structure, where the object determining the limiting distribution is a V-statistic
with a remainder that is an asymptotically normal degenerate U-statistic. We
illustrate how this general structure can be used to derive new results by obtaining a 
new asymptotic distribution of a series estimator of the partially linear model when 
the number of terms in the series approximation possibly grows as fast as the sample 
size, which we call “many terms asymptotics”. 
1 Introduction


Many instrument asymptotics, where the number of instruments grows as fast as the sample
size, has proven useful for instrumental variables (IV) estimators. Kunitomo (1980)
and Morimune (1983) derived asymptotic variances that are larger than the usual formulae 
when the number of instruments and sample size grow at the same rate, and Bekker (1994) 
and others provided consistent estimators of these larger variances. Hansen, Hausman, and 
Newey (2008) showed that using many instrument standard errors provides a theoretical 
improvement for a range of numbers of instruments and a practical improvement for
estimating the returns to schooling. Thus, many instrument asymptotics and the associated
standard errors have been demonstrated to be a useful alternative to the usual asymptotics 
for instrumental variables. 

Instrumental variable estimators implicitly depend on a nonparametric series estimator. 
Many instrument asymptotics has the number of series terms growing so fast that the series 
estimator is not consistent. Analogous asymptotics for kernel-based density-weighted average 
derivative estimators has been considered by Cattaneo, Crump, and Jansson (2010, 2014b). 
They show that when the bandwidth shrinks faster than needed for consistency of the kernel 
estimator, the variance of the estimator is larger than the usual formula. They also find that
correcting the variance provides an improvement over standard asymptotics for a range of 
bandwidths. 

The purpose of this paper is to show that these results share a common structure, and to 
illustrate how this structure can be used to derive new results. The common structure is that 
the object determining the limiting distribution is a V-statistic, which can be decomposed 
into a bias term, a sample average, and a “remainder” that is an asymptotically normal 
degenerate U-statistic. Asymptotic normality of the remainder distinguishes this setting 
from other ones involving V-statistics. Here the asymptotically normal remainder comes 
from the number of series terms going to infinity or bandwidth shrinking to zero, while the
behavior of a degenerate U-statistic tends to be more complicated in other settings. When 
the number of terms grows as fast as the sample size, or the bandwidth shrinks to zero at an 
appropriate rate, the remainder has the same magnitude as the leading term, resulting in an 
asymptotic variance larger than just the variance of the leading term. The many instrument 
and small bandwidth results share this structure. In keeping with this common structure, we 
will henceforth refer to such results under the general heading of “alternative asymptotics”. 

The alternative asymptotics that we discuss in this paper applies to statistics that take
a specific V-statistic representation, or may be approximated by it sufficiently accurately,
and therefore it does not apply broadly to all possible semiparametric settings. Nonetheless,
as we illustrate below, this structure arises naturally in several interesting problems in
Economics and Statistics. In particular, we show formally that applying this common structure
to a series estimator of the partially linear model leads to new results. These results allow the 
number of terms in the series approximation to grow as fast as the sample size. The
asymptotic distribution of the estimator is derived and it is shown to have a larger asymptotic
variance than the usual formula, which is in fact a natural and generic consequence of the
specific structure that we highlight in this paper. We also find that under homoskedasticity,
the classical degrees of freedom adjusted homoskedastic standard error estimator from linear
models is consistent even when the number of terms is “large” relative to the sample size.
This result offers a large sample, distribution free justification for the degrees of freedom
correction when many series terms are employed. Constructing an automatic consistent
standard error estimator under (conditional) heteroskedasticity of unknown form in this setting
turns out to be quite challenging. In Cattaneo, Jansson, and Newey (2015), we present
a detailed discussion of heteroskedasticity-robust standard errors for general linear models 
with increasing dimension, which covers the partially linear model with many terms studied 
herein as a special case. 

The rest of the paper is organized as follows. Section 2 describes the common structure 
of many instrument and small bandwidth asymptotics, and also shows how the structure 
leads to new results for the partially linear model. Section 3 formalizes the new
distributional approximation for the partially linear model. Section 4 reports results from a small
simulation study aimed at illustrating our results in small samples. Section 5 concludes. The
appendix collects the proofs of our results. 

2 A Common Structure 

To describe the common structure of many instrument and small bandwidth asymptotics, 
let $W_1, \ldots, W_n$ denote independent random vectors. We consider an estimator $\hat\beta$ of a generic
parameter of interest $\beta_0 \in \mathbb{R}^d$ satisfying

$$\sqrt{n}(\hat\beta - \beta_0) = \hat\Gamma_n^{-1} S_n, \qquad S_n = \sum_{1 \le i,j \le n} u^n_{ij}(W_i, W_j), \tag{1}$$

where $u^n_{ij}(\cdot)$ is a function that can depend on $i$, $j$, and $n$. We allow $u$ to depend on $n$ to
account for numbers of terms or bandwidths that change with the sample size. Also, we allow
$u$ to vary with $i$ and $j$ to account for dependence on variables that are being conditioned on
in the asymptotics, and so treated as nonrandom.

We assume throughout this section that there exists a sequence of non-random matrices
$\Gamma_n$ satisfying $\Gamma_n^{-1}\hat\Gamma_n \to_p I_d$ for $I_d$ the $d \times d$ identity matrix, and hence we focus on the
V-statistic $S_n$. (All limits are taken as $n \to \infty$ unless explicitly stated otherwise.) This
V-statistic has a well known (Hoeffding-type) decomposition that we describe here because
it is an essential feature of the common structure. For notational simplicity we will drop the
$W_i$ and $W_j$ arguments and set $u^n_{ij} = u^n_{ij}(W_i, W_j)$ and $\tilde u^n_{ij} = u^n_{ij} + u^n_{ji} - E[u^n_{ij} + u^n_{ji}]$.

Letting $\|\cdot\|$ denote the Euclidean norm, and if $E[\|u^n_{ij}\|] < \infty$ for all $i$, $j$, $n$, then


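$$S_n = B_n + \Psi_n + U_n, \tag{2}$$

where

$$B_n = E[S_n], \qquad \Psi_n = \sum_{1 \le i \le n} \psi^n_i(W_i), \qquad U_n = \sum_{2 \le i \le n} D^n_i(W_i, \ldots, W_1),$$

$$\psi^n_i(W_i) = u^n_{ii} - E[u^n_{ii}] + \sum_{1 \le j \le n, j \ne i} E[\tilde u^n_{ij} | W_i], \qquad D^n_i(W_i, \ldots, W_1) = \sum_{1 \le j < i} \left( \tilde u^n_{ij} - E[\tilde u^n_{ij} | W_i] - E[\tilde u^n_{ij} | W_j] \right),$$

so that $E[\psi^n_i(W_i)] = 0$, $E[D^n_i(W_i, \ldots, W_1) | W_{i-1}, \ldots, W_1] = 0$, and $E[\Psi_n U_n'] = 0$. This
decomposition of a V-statistic is well known (e.g., van der Vaart (1998, Chapter 11)), and shows
that $S_n$ can be decomposed into a sum of independent terms $\Psi_n$, a U-statistic remainder $U_n$
that is a martingale difference sum and uncorrelated with $\Psi_n$, and a pure bias term $B_n$.¹ The
decomposition is important in many of the proofs of asymptotic normality of semiparametric
estimators, including Powell, Stock, and Stoker (1989), with the limiting distribution being
determined by $\Psi_n$ and $U_n$ being treated as a “remainder” that is of smaller order under a
particular restriction on the tuning parameter sequence (e.g., when the bandwidth shrinks
slowly enough).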

An interesting feature of the decomposition (2) in semiparametric settings is that $U_n$ is
asymptotically normal at some rate when the number of series terms grows or the bandwidth
shrinks to zero. To be specific, under regularity conditions and appropriate tuning parameter
sequences that we make precise below, it turns out that

$$V[U_n]^{-1/2} U_n \to_d \mathcal{N}(0, I_d).$$


In other settings, where the underlying kernel of the U-statistic does not vary with the
sample size, the asymptotic behavior of $U_n$ is usually more complicated: because it is a
degenerate U-statistic, it would converge to a weighted sum of independent chi-square random
variables (e.g., van der Vaart (1998, Chapter 12)). However, in semiparametric-type settings
such as those considered here, the kernel of the underlying U-statistic forming $U_n$ changes with
the sample size and hence, under particular tuning parameter configurations, the individual
contributions $D^n_i(W_i, \ldots, W_1)$ to $U_n$ can be made small enough to satisfy a Lindeberg-Feller
condition and thus obtain a Gaussian limiting distribution (usually employing the martingale
property of $U_n$). For an interesting discussion of this phenomenon, see de Jong (1987).
The asymptotic normality property of $U_n$ has been shown for certain classes of both series
and kernel based estimators, as further explained below.

¹ In time series contexts, the exact decomposition is less useful, but approximations thereof with properties
similar to those we discuss herein can be developed. For an example and related references see Atchade and
Cattaneo (2014).

Alternative asymptotics occurs when the number of series terms grows or the bandwidth
shrinks fast enough so that $V[\Psi_n]$ and $V[U_n]$ have the same magnitude in the limit. Because of
uncorrelatedness of $\Psi_n$ and $U_n$, the asymptotic variance will be larger than the usual formula
which is $\lim_{n \to \infty} V[\Psi_n]$ (assuming the limit exists). As a consequence, consistent variance
estimation under alternative asymptotics requires accounting for the contribution of Un to 
the (asymptotic) sampling variability of the statistic. Accounting for the presence of Un 
should also yield improvements when numbers of series terms and bandwidths do not satisfy 
the knife-edge conditions of alternative asymptotics, since Un is part of the semiparametric 
statistic. For instance, if the number of series terms grows just slightly slower than the 
sample size then accounting for the presence of Un should still give a better large sample 
approximation. Hansen, Hausman, and Newey (2008) show such an improvement for many 
instrument asymptotics. It would be good to consider such improved approximations more 
generally, though it is beyond the scope of this paper to do so. 

Distribution theory under alternative asymptotics may be seen as a generalization of the 
conventional large sample distributional approximation approach in the sense that under 
conventional sequences of tuning parameters the asymptotic variances emerging from both 
approaches coincide. But, the alternative asymptotic approximation also allows for other 
tuning parameter sequences and, in this case, the limiting asymptotic variance is seen to be 
larger than usual. Thus, in general, there is no reason to expect that the usual standard 
error formulas derived under conventional asymptotics will remain valid more generically. 
From this perspective, alternative asymptotics are useful to provide theoretical justification
for new standard error formulas that are consistent under more general sequences of tuning 
parameters, that is, under both conventional and alternative asymptotics. We refer to the 
latter standard error formulas as being more robust than the usual standard error formulas 
available in the literature. For instance, using these ideas, the case for new, more robust
standard error formulas was made previously for many instrument asymptotics in IV models
(Hansen, Hausman, and Newey (2008)) and small bandwidth asymptotics in kernel-based
semiparametrics (Cattaneo, Crump, and Jansson (2014b)).

To illustrate these ideas, we show next that both many instrument asymptotics and 
small bandwidth asymptotics have the structure described above, and we also employ this 
approach to derive new results in the case of a series estimator of the partially linear model, 
which we refer to as “many terms asymptotics”. 

Example 1: “Many Instrument Asymptotics” 

The first example is concerned with the case of many instrument asymptotics. For simplicity
we focus on the JIVE2 estimator of Angrist, Imbens, and Krueger (1999), but the idea 
applies to other IV estimators such as the limited information maximum likelihood estimator. 
See Chao, Swanson, Hausman, Newey, and Woutersen (2012) for more details, including 
regularity conditions under which the following discussion can be made rigorous. 

Let $(y_i, x_i', z_i')'$, $i = 1, \ldots, n$, be a random sample generated by the model

$$y_i = x_i'\beta_0 + \varepsilon_i, \qquad E[\varepsilon_i | z_i] = 0, \tag{3}$$

where $y_i$ is a scalar dependent variable, $x_i \in \mathbb{R}^d$ is a vector of endogenous variables, $\varepsilon_i$ is a
disturbance, and $z_i \in \mathbb{R}^K$ is a vector of instrumental variables.

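To describe the JIVE2 estimator of $\beta_0$ in (3), let $Q_{ij}$ denote the $(i,j)$-th element of
$Q = Z(Z'Z)^{-1}Z'$, where $Z = [z_1, \ldots, z_n]'$. After centering and scaling, the JIVE2 estimator
$\hat\beta$ satisfies

$$\sqrt{n}(\hat\beta - \beta_0) = \Big( \frac{1}{n} \sum_{1 \le i,j \le n, i \ne j} Q_{ij} x_i x_j' \Big)^{-1} \Big( \frac{1}{\sqrt{n}} \sum_{1 \le i,j \le n, i \ne j} Q_{ij} x_i \varepsilon_j \Big).$$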

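Conditional on $Z$, $\hat\beta$ has the structure in (1) with $W_i = (x_i', \varepsilon_i)'$ and

$$\hat\Gamma_n = \frac{1}{n} \sum_{1 \le i,j \le n, i \ne j} Q_{ij} x_i x_j', \qquad u^n_{ij}(W_i, W_j) = 1(i \ne j) Q_{ij} x_i \varepsilon_j / \sqrt{n},$$

where $1(\cdot)$ is the indicator function.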

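For $i \ne j$, $E[u^n_{ij}(W_i, W_j) | Z] = 0$ and

$$E[u^n_{ij}(W_i, W_j) | W_i, Z] = Q_{ij} x_i E[\varepsilon_j | Z] / \sqrt{n} = 0, \qquad E[u^n_{ij}(W_i, W_j) | W_j, Z] = Q_{ij} \Upsilon_i \varepsilon_j / \sqrt{n},$$

where $\Upsilon_i = E[x_i | z_i]$ can be interpreted as the reduced form for observation $i$. As a
consequence, (2) is satisfied with $B_n = 0$,

$$\psi^n_i(W_i) = \Big( \sum_{1 \le j \le n, j \ne i} Q_{ij} \Upsilon_j \Big) \varepsilon_i / \sqrt{n}, \qquad D^n_i(W_i, \ldots, W_1) = \sum_{1 \le j < i} Q_{ij} (v_i \varepsilon_j + v_j \varepsilon_i) / \sqrt{n}, \qquad v_i = x_i - \Upsilon_i.$$

Because $\Upsilon_i - \sum_{1 \le j \le n} Q_{ij} \Upsilon_j$ is the $i$-th residual from regressing the reduced form observations
on $Z$, by appropriate definition of the reduced form this can generally be assumed to vanish
as the sample size grows. In that case,

$$\Psi_n = \frac{1}{\sqrt{n}} \sum_{1 \le i \le n} (1 - Q_{ii}) \Upsilon_i \varepsilon_i + o_p(1).$$

Furthermore, under standard asymptotics $Q_{ii}$ will go to zero, so the limiting variance of the
leading term in $\Psi_n$ corresponds to the usual asymptotic variance for IV. The degenerate
U-statistic term is

$$U_n = \frac{1}{\sqrt{n}} \sum_{1 \le j < i \le n} Q_{ij} (v_i \varepsilon_j + v_j \varepsilon_i).$$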

Chao, Swanson, Hausman, Newey, and Woutersen (2012) apply a martingale central limit
theorem to show that this $U_n$ will be asymptotically normal when $K \to \infty$ and certain
regularity conditions hold. The conditions of the martingale central limit theorem are verified
by showing that certain linear combinations with coefficients depending on the elements of
$Q$ go to zero as $K \to \infty$. In the proof, this makes individual terms asymptotically negligible,
with a Lindeberg-Feller condition being satisfied. Alternative asymptotics occurs when $K$
grows as fast as $n$, resulting in $V[\Psi_n]$ and $V[U_n]$ having the same magnitude in the limit.
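
As a rough numerical companion to Example 1, the following sketch computes a JIVE2-type estimate by dropping the diagonal of the projection matrix $Q$. The function name `jive2`, the simulated design, and all parameter values are assumptions made for illustration; they are not taken from the papers cited above.

```python
import numpy as np

def jive2(y, X, Z):
    """JIVE2-type estimate: (sum_{i!=j} Q_ij x_i x_j')^{-1} sum_{i!=j} Q_ij x_i y_j."""
    # Projection matrix Q = Z (Z'Z)^{-1} Z' on the instruments.
    Q = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    # Dropping the diagonal removes the own-observation (i = j) terms.
    Q_off = Q - np.diag(np.diag(Q))
    return np.linalg.solve(X.T @ Q_off @ X, X.T @ Q_off @ y)

# Illustrative data (all values below are made up for this sketch).
rng = np.random.default_rng(0)
n, K, d = 200, 20, 1
Z = rng.normal(size=(n, K))
v = rng.normal(size=(n, d))
eps = 0.5 * v[:, 0] + rng.normal(size=n)   # endogeneity: eps is correlated with v
X = Z @ np.full((K, d), 0.3) + v
beta0 = np.array([1.0])
y = X @ beta0 + eps
beta_hat = jive2(y, X, Z)
```

Because the $i = j$ terms are excluded, the numerator has conditional mean zero at $\beta_0$ given $Z$, which is what makes the V-statistic representation above free of a bias term.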

Example 2: “Small Bandwidth Asymptotics” 

The second example shows that small bandwidth asymptotics for certain kernel-based
semiparametric estimators also has the structure outlined above. To keep the exposition simple
we focus on an estimator of the integrated squared density, but the structure of this estimator 
is shared by the density-weighted average derivative estimator of Powell, Stock, and Stoker 
(1989) treated in Cattaneo, Crump, and Jansson (2014b) and more generally by estimators 
of density-weighted averages and ratios thereof (see, e.g., Newey, Hsieh, and Robins (2004, 
Section 2) and references therein). 

Suppose $x_i$, $i = 1, \ldots, n$, are i.i.d. continuously distributed $p$-dimensional random vectors
with smooth p.d.f. $f_0$ and consider estimation of the integrated squared density

$$\beta_0 = \int_{\mathbb{R}^p} f_0(x)^2 dx = E[f_0(x_i)].$$

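A leave-one-out kernel-based estimator is

$$\hat\beta = \frac{1}{n(n-1)} \sum_{1 \le i,j \le n, i \ne j} \mathcal{K}_h(x_i - x_j),$$

where $\mathcal{K}(u)$ is a symmetric kernel and $\mathcal{K}_h(u) = h^{-p}\mathcal{K}(u/h)$. This estimator has the V-statistic
form of (1) with $W_i = x_i$ and

$$\hat\Gamma_n = 1, \qquad u^n_{ij}(W_i, W_j) = 1(i \ne j) \frac{\mathcal{K}_h(x_i - x_j) - \beta_0}{\sqrt{n}(n-1)}.$$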

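Let $f_h(x) = \int_{\mathbb{R}^p} \mathcal{K}(u) f_0(x + hu) du$ and $\beta_h = \int_{\mathbb{R}^p} f_h(x) f_0(x) dx$. By symmetry of $\mathcal{K}(u)$,

$$E[u^n_{ij}(W_i, W_j) | W_i] = E[u^n_{ji}(W_j, W_i) | W_i] = \frac{f_h(x_i) - \beta_0}{\sqrt{n}(n-1)}, \qquad E[u^n_{ij}(W_i, W_j)] = \frac{\beta_h - \beta_0}{\sqrt{n}(n-1)},$$

so the terms in the decomposition (2) are of the form

$$B_n = \sqrt{n}\{\beta_h - \beta_0\}, \qquad \Psi_n = \frac{1}{\sqrt{n}} \sum_{1 \le i \le n} 2\{f_h(x_i) - \beta_h\},$$

$$U_n = \frac{2}{\sqrt{n}(n-1)} \sum_{1 \le j < i \le n} \{\mathcal{K}_h(x_i - x_j) - f_h(x_i) - f_h(x_j) + \beta_h\}.$$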


Here, $2\{f_h(x_i) - \beta_h\}$ is an approximation to the well known influence function
$2\{f_0(x_i) - \beta_0\}$ for estimators of the integrated squared density. Under regularity conditions, $f_h(x_i)$
converges to $f_0(x_i)$ in mean square as $h \to 0$, so that

$$\Psi_n = \frac{1}{\sqrt{n}} \sum_{1 \le i \le n} 2\{f_0(x_i) - \beta_0\} + o_p(1).$$


A martingale central limit theorem can be applied as in Cattaneo, Crump, and Jansson
(2014b) to show that the degenerate U-statistic term $U_n$ will be asymptotically normal as
$h \to 0$ and $n \to \infty$, provided that $n^2 h^p \to \infty$. It is easy to show that
$n h^p V[U_n] \to \Delta = 2\beta_0 \int_{\mathbb{R}^p} \mathcal{K}(u)^2 du$ (under regularity conditions). Alternative asymptotics occurs when $h^p$
shrinks as fast as $1/n$, resulting in $V[\Psi_n]$ and $V[U_n]$ having the same magnitude in the limit.
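
To make the estimator in Example 2 concrete, here is a minimal sketch of the leave-one-out estimator for scalar data ($p = 1$). The Gaussian kernel, the bandwidth $h = 0.3$, the sample size, and the function name `isd_leave_one_out` are all assumptions chosen for illustration.

```python
import numpy as np

def isd_leave_one_out(x, h):
    """Leave-one-out estimate of the integrated squared density for scalar data (p = 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = (x[:, None] - x[None, :]) / h
    # Gaussian kernel (an assumption of this sketch); K_h(u) = K(u / h) / h when p = 1.
    k_h = np.exp(-0.5 * diffs**2) / (np.sqrt(2.0 * np.pi) * h)
    np.fill_diagonal(k_h, 0.0)          # leave out the i = j terms
    return k_h.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
x = rng.normal(size=500)
beta_hat = isd_leave_one_out(x, h=0.3)
beta_0 = 1.0 / (2.0 * np.sqrt(np.pi))   # integral of the squared N(0, 1) density
```

Shrinking `h` reduces the smoothing bias $\beta_h - \beta_0$ but inflates the contribution of the pairwise term $U_n$, which is the trade-off that small bandwidth asymptotics is designed to capture.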
Example 3: “Many Terms Asymptotics”

The previous two examples show how several estimators share the common structure outlined 
above. To illustrate how this structure can be applied to derive new results, the third example 
studies series estimation in the context of the partially linear model. The results will shed 
light on the asymptotic behavior of this estimator, and the associated inference procedures, 
when the number of terms is allowed to grow as fast as the sample size.

Let $(y_i, x_i', z_i')'$, $i = 1, \ldots, n$, be a random sample generated by the partially linear
model

$$y_i = x_i'\beta_0 + g(z_i) + \varepsilon_i, \qquad E[\varepsilon_i | x_i, z_i] = 0, \tag{4}$$

where $y_i$ is a scalar dependent variable, $x_i \in \mathbb{R}^d$ and $z_i \in \mathbb{R}^{d_z}$ are explanatory variables, $\varepsilon_i$ is
a disturbance, $g(\cdot)$ is an unknown function, and $E[V[x_i | z_i]]$ is of full rank.

A series estimator of $\beta_0$ is obtained by regressing $y_i$ on $x_i$ and approximating functions
of $z_i$. To describe the estimator, let $p^1(z), p^2(z), \ldots$ be approximating functions, such as
polynomials or splines, and let $p_K(z) = (p^1(z), \ldots, p^K(z))'$ be a $K$-dimensional vector of
such functions. Letting $M_{ij}$ denote the $(i,j)$-th element of $M = I_n - P_K(P_K'P_K)^{-1}P_K'$, where
$P_K = [p_K(z_1), \ldots, p_K(z_n)]'$, a series estimator of $\beta_0$ in (4) is given by

$$\hat\beta = \Big( \sum_{1 \le i,j \le n} M_{ij} x_i x_j' \Big)^{-1} \Big( \sum_{1 \le i,j \le n} M_{ij} x_i y_j \Big).$$

Donald and Newey (1994) gave conditions for asymptotic normality of this estimator using 
standard asymptotics. See also, for example, Linton (1995) and references therein for related
asymptotic results when using kernel estimators. 
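
The series estimator can be sketched in a few lines by partialling the approximating functions out of both $y_i$ and $x_i$. The helper names `residualize` and `plm_series`, the cubic polynomial basis in a scalar $z_i$, and the simulated coefficients below are illustrative assumptions only.

```python
import numpy as np

def residualize(A, P):
    """Return M A with M = I - P (P'P)^{-1} P', computed by least squares."""
    coef, *_ = np.linalg.lstsq(P, A, rcond=None)
    return A - P @ coef

def plm_series(y, X, P):
    """Series estimate of beta_0: (X'MX)^{-1} X'My."""
    MX = residualize(X, P)
    My = residualize(y, P)
    # M is symmetric and idempotent, so X'MX = (MX)'(MX) and X'My = (MX)'(My).
    return np.linalg.solve(MX.T @ MX, MX.T @ My)

# Illustrative data: scalar z, cubic polynomial basis (K = 4), made-up coefficients.
rng = np.random.default_rng(2)
n = 300
z = rng.uniform(-1.0, 1.0, size=n)
P = np.vander(z, 4, increasing=True)        # columns 1, z, z^2, z^3
X = (np.exp(z) + rng.normal(size=n)).reshape(n, 1)
y = X[:, 0] * 1.0 + np.exp(z) + rng.normal(size=n)
beta_hat = plm_series(y, X, P)
```

Working with least-squares residuals avoids forming the $n \times n$ matrix $M$ explicitly, which matters once $K$ and $n$ are both large.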

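Conditional on $Z = [z_1, \ldots, z_n]'$, $\hat\beta$ has the structure outlined earlier:

$$\sqrt{n}(\hat\beta - \beta_0) = \hat\Gamma_n^{-1} S_n, \tag{5}$$

with

$$\hat\Gamma_n = \frac{1}{n} \sum_{1 \le i,j \le n} M_{ij} x_i x_j', \qquad S_n = \frac{1}{\sqrt{n}} \sum_{1 \le i,j \le n} M_{ij} x_i (g_j + \varepsilon_j),$$

where $g_i = g(z_i)$. In other words, $\hat\beta$ has the V-statistic form of (1) with $W_i = (x_i', \varepsilon_i)'$ and

$$u^n_{ij}(W_i, W_j) = x_i M_{ij} (g_j + \varepsilon_j) / \sqrt{n}.$$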

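By $E[\varepsilon_i | x_i, z_i] = 0$ we have $E[x_i \varepsilon_j | Z] = 0$. Therefore, letting $u^n_{ij} = u^n_{ij}(W_i, W_j)$ as we have
done previously, we have

$$E[u^n_{ij} | Z] = h_i M_{ij} g_j / \sqrt{n}, \qquad u^n_{ij} - E[u^n_{ij} | Z] = M_{ij} (v_i g_j + x_i \varepsilon_j) / \sqrt{n},$$

$$\tilde u^n_{ij} = M_{ij} (v_j g_i + v_i g_j + x_j \varepsilon_i + x_i \varepsilon_j) / \sqrt{n}, \qquad E[\tilde u^n_{ij} | W_i, Z] = M_{ij} (v_i g_j + h_j \varepsilon_i) / \sqrt{n},$$

for $i \ne j$, where $h_i = h(z_i) = E[x_i | z_i]$ and $v_i = x_i - h_i$. In this case, the bias term in (2) is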

$$B_n = \frac{1}{\sqrt{n}} \sum_{1 \le i,j \le n} M_{ij} h_i g_j,$$

which will be negligible under regularity conditions, as shown in the next section. Moreover, 

$$\Psi_n = \frac{1}{\sqrt{n}} \sum_{1 \le i \le n} M_{ii} v_i \varepsilon_i + R_n, \qquad R_n = \frac{1}{\sqrt{n}} \sum_{1 \le i,j \le n} M_{ij} (v_i g_j + h_i \varepsilon_j),$$

where $R_n$ has mean zero and converges to zero in mean square as $K$ grows, as further
discussed below. Under standard asymptotics $M_{ii}$ will go to one and hence the limiting
variance of the leading term in $\Psi_n$ corresponds to the usual asymptotic variance.

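Finally, we find that the degenerate U-statistic term is

$$U_n = \frac{1}{\sqrt{n}} \sum_{1 \le j < i \le n} M_{ij} (v_i \varepsilon_j + v_j \varepsilon_i) = -\frac{1}{\sqrt{n}} \sum_{1 \le j < i \le n} Q_{ij} (v_i \varepsilon_j + v_j \varepsilon_i),$$

where $Q_{ij} = -M_{ij}$ for $i \ne j$ is the $(i,j)$-th element of the projection matrix $Q = I_n - M$.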

Remarkably, this term is essentially the same as the degenerate U-statistic term for JIVE2
that was discussed above. Consequently, the central limit theorem of Chao, Swanson,
Hausman, Newey, and Woutersen (2012) is applicable to this problem. We will employ it to show
that $U_n$ is asymptotically normal as $K \to \infty$, even when $K/n$ does not converge to zero.

This example highlights a new approach to studying the asymptotic distribution of
semi-linear regression under many terms asymptotics. The alternative asymptotic approximation
is useful, for instance, when the number of covariates entering the nonparametric part is 
large relative to the sample size, as is often the case in empirical applications. 


3 Many Terms Asymptotics 

In this section we make precise the discussion given in Example 3, and also discuss consistent 
standard error estimation under homoskedasticity. The estimator $\hat\beta$ described in Example
3 can be interpreted as a two-step semiparametric estimator with tuning parameter $K$, the
first step involving series estimation of the unknown (regression) functions $g(z)$ and $h(z)$.
Donald and Newey (1994) gave conditions for asymptotic normality of this estimator when
$K/n \to 0$. Here we generalize their findings by obtaining an asymptotic distributional result
that is valid even when $K/n$ is bounded away from zero.

The analysis proceeds under the following assumption. 
Assumption PLM (Partially Linear Model)

(a) $(y_i, x_i', z_i')'$, $i = 1, \ldots, n$, is a random sample.

(b) There is a $C < \infty$ such that $E[\varepsilon_i^4 | x_i, z_i] \le C$ and $E[\|v_i\|^4 | z_i] \le C$.

(c) There is a $C > 0$ such that $E[\varepsilon_i^2 | x_i, z_i] \ge C$ and $\lambda_{\min}(E[v_i v_i' | z_i]) \ge C$.

(d) $\mathrm{rank}(P_K) = K$ (a.s.) and there is a $C > 0$ such that $M_{ii} \ge C$.

(e) For some $\alpha_g, \alpha_h > 0$, there is a $C < \infty$ such that

$$\min_{\eta_g \in \mathbb{R}^K} E[|g(z_i) - \eta_g' p_K(z_i)|^2] \le C K^{-2\alpha_g}, \qquad \min_{\eta_h \in \mathbb{R}^{K \times d}} E[\|h(z_i) - \eta_h' p_K(z_i)\|^2] \le C K^{-2\alpha_h}.$$

Because $\sum_{1 \le i \le n} M_{ii} = n - K$, an implication of part (d) is that $K/n \le 1 - C < 1$,

but crucially Assumption PLM does not imply that $K/n \to 0$. Part (e) is implied by
conventional assumptions from approximation theory. For instance, when the support of
$z_i$ is compact, commonly used bases of approximation, such as polynomials or splines, will
satisfy this assumption with $\alpha_g = s_g/d_z$ and $\alpha_h = s_h/d_z$, where $s_g$ and $s_h$ denote the
number of continuous derivatives of $g(z)$ and $h(z)$, respectively. Further discussion and
related references for several bases of approximation may be found in Newey (1997), Chen
(2007) and Belloni, Chernozhukov, Chetverikov, and Kato (2015), among others. 
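
The implication of part (d) noted above can be checked numerically. In the sketch below, the basis matrix `P` is a randomly generated stand-in (an assumption, not a polynomial or spline basis), and `C` is taken to be the smallest diagonal element of $M$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 200, 50
P = rng.normal(size=(n, K))                  # hypothetical basis matrix of full column rank
M = np.eye(n) - P @ np.linalg.solve(P.T @ P, P.T)
M_ii = np.diag(M)
trace_gap = abs(M_ii.sum() - (n - K))        # sum of M_ii equals n - K up to rounding
C = M_ii.min()                               # largest constant compatible with part (d)
```

Since the $M_{ii}$ average to $1 - K/n$, their minimum cannot exceed that average, which is exactly the bound $K/n \le 1 - C$.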

3.1 Asymptotic Distribution 

From equation (5), and the discussion in the previous section, we see that the asymptotic
distribution of $\hat\beta$ will be determined by the behavior of $\hat\Gamma_n$ and $S_n$. The following lemma
approximates $\hat\Gamma_n$ without requiring that $K/n \to 0$.

Lemma 1 If Assumption PLM is satisfied and if $K \to \infty$, then

$$\hat\Gamma_n = \Gamma_n + o_p(1), \qquad \Gamma_n = \frac{1}{n} \sum_{1 \le i \le n} M_{ii} E[v_i v_i' | z_i].$$

Because $\sum_{1 \le i \le n} M_{ii} = n - K$, it follows from this result that in the homoskedastic $v_i$ case
(i.e., when $E[v_i v_i' | z_i] = E[v_i v_i']$) $\hat\Gamma_n$ is close to

$$\Gamma_n = (1 - K/n)\Gamma, \qquad \Gamma = E[v_i v_i'],$$

in probability. More generally, with heteroskedasticity, $\hat\Gamma_n$ will be close to the weighted
average $\Gamma_n$. Importantly, this result includes standard asymptotics as a special case when
$K/n \to 0$, where $\frac{1}{n} \sum_{1 \le i \le n} (1 - M_{ii}) = K/n \to 0$, the law of large numbers and iterated expectations
imply

$$\hat\Gamma_n = \frac{1}{n} \sum_{i=1}^{n} E[v_i v_i' | z_i] - \frac{1}{n} \sum_{i=1}^{n} (1 - M_{ii}) E[v_i v_i' | z_i] + o_p(1) = \frac{1}{n} \sum_{i=1}^{n} E[v_i v_i' | z_i] + o_p(1) = \Gamma + o_p(1).$$


Next, we study

$$S_n = \frac{1}{\sqrt{n}} \sum_{1 \le i,j \le n} M_{ij} v_i \varepsilon_j + B_n + R_n.$$

The following lemma quantifies the magnitude of the bias term $B_n$ as well as the additional
variability arising from the (remainder) term $R_n$.


Lemma 2 If Assumption PLM is satisfied and if $K \to \infty$, then $B_n = O_p(\sqrt{n} K^{-\alpha_g - \alpha_h})$ and
$R_n = o_p(1)$.

Like the previous lemma, this lemma does not require $K/n \to 0$. Interestingly, the
bias term $B_n$ involves approximation of both unknown functions $g(z)$ and $h(z)$, implying
an implicit trade-off between smoothness conditions for $g(z)$ and $h(z)$. The implied bias
condition $K^{\alpha_g + \alpha_h}/\sqrt{n} \to \infty$ only requires that $\alpha_g + \alpha_h$ be large enough, but not necessarily
that $\alpha_g$ and $\alpha_h$ separately be large. It follows that if this bias condition holds, then

$$S_n = \frac{1}{\sqrt{n}} \sum_{1 \le i,j \le n} M_{ij} v_i \varepsilon_j + o_p(1),$$

as claimed in Example 3 above.

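Having dispensed with asymptotically negligible contributions to $S_n$, we turn to its
leading term. This term is shown below to be asymptotically Gaussian with asymptotic variance
given by

$$\Sigma_n = \frac{1}{n} V\Big[ \sum_{1 \le i,j \le n} M_{ij} v_i \varepsilon_j \Big| Z \Big] = \frac{1}{n} \sum_{1 \le i \le n} M_{ii}^2 E[v_i v_i' \varepsilon_i^2 | z_i] + \frac{1}{n} \sum_{1 \le i,j \le n, i \ne j} M_{ij}^2 E[v_i v_i' | z_i] E[\varepsilon_j^2 | z_j].$$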

Here, the first term following the second equality corresponds to the usual asymptotic
approximation, while the second term accounts for large $K$. Once
again it is interesting to consider what happens in some special cases. Under
homoskedasticity of $\varepsilon_i$ (i.e., when $E[\varepsilon_i^2 | x_i, z_i] = E[\varepsilon_i^2]$),

$$\Sigma_n = \frac{\sigma_\varepsilon^2}{n} \sum_{1 \le i,j \le n} M_{ij}^2 E[v_i v_i' | z_i] = \frac{\sigma_\varepsilon^2}{n} \sum_{1 \le i \le n} M_{ii} E[v_i v_i' | z_i] = \sigma_\varepsilon^2 \Gamma_n, \qquad \sigma_\varepsilon^2 = E[\varepsilon_i^2],$$

because $\sum_{1 \le j \le n} M_{ij}^2 = M_{ii}$. If, in addition, $E[v_i v_i' | z_i] = E[v_i v_i']$, then $\Sigma_n = \sigma_\varepsilon^2 (1 - K/n)\Gamma$.
Also, if $K/n \to 0$, then by $\sum_{1 \le i,j \le n, i \ne j} M_{ij}^2 / n \le K/n$ and the law of large numbers, we have

$$\Sigma_n = \frac{1}{n} \sum_{1 \le i \le n} M_{ii}^2 E[v_i v_i' \varepsilon_i^2 | z_i] + o_p(1) = E[v_i v_i' \varepsilon_i^2] + o_p(1),$$

which corresponds to the standard asymptotics limiting variance.

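The following theorem combines Lemmas 1 and 2 with a central limit theorem for
quadratic forms to show asymptotic normality of $\hat\beta$.

Theorem 1 If Assumption PLM is satisfied and if $K^{\alpha_g + \alpha_h}/\sqrt{n} \to \infty$, then

$$\Omega_n^{-1/2} \sqrt{n}(\hat\beta - \beta_0) \to_d \mathcal{N}(0, I_d), \qquad \Omega_n = \Gamma_n^{-1} \Sigma_n \Gamma_n^{-1}.$$

If, in addition, $E[\varepsilon_i^2 | x_i, z_i] = \sigma_\varepsilon^2$, then $\Omega_n = \sigma_\varepsilon^2 \Gamma_n^{-1}$.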

This theorem shows that $\hat\beta$ is asymptotically normal when $K/n$ need not converge to zero.
An implication of this result is that inconsistent series-based nonparametric estimators of
the unknown functions $g(z)$ and $h(z)$ may be employed when forming $\hat\beta$, that is, $K/n \not\to 0$ is
allowed (increasing the variability of the nonparametric estimators), provided that $K \to \infty$
(to remove nonparametric smoothing bias). This asymptotic distributional result does not
rely on asymptotic linearity, nor on the actual convergence of the matrices $\Gamma_n$ and $\Sigma_n$, and
leads to a new (larger) asymptotic variance that captures terms that are assumed away
by the classical result. The asymptotic distribution result of Donald and Newey (1994) is
obtained as a special case where $K/n \to 0$. More generally, when $K/n$ does not converge to
zero, the asymptotic variance will be larger than the usual formula because it accounts for
the contribution of the “remainder” $U_n$ in equation (2). For instance, when both $\varepsilon_i$ and $v_i$ are
homoskedastic, the asymptotic variance is

$$\Omega_n = \sigma_\varepsilon^2 \Gamma_n^{-1} = \sigma_\varepsilon^2 \Gamma^{-1} (1 - K/n)^{-1},$$

which is larger than the usual asymptotic variance $\sigma_\varepsilon^2 \Gamma^{-1}$ by the degrees of freedom correction
$(1 - K/n)^{-1}$.
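
To get a feel for the size of this correction, the short sketch below evaluates $(1 - K/n)^{-1}$ for a few values of $K$ with $n = 500$, matching some of the grid points used in the simulation study of Section 4. The function name `variance_inflation` is introduced here only for illustration.

```python
def variance_inflation(K, n):
    """Degrees-of-freedom factor (1 - K/n)^{-1} multiplying the usual variance."""
    if not 0 <= K < n:
        raise ValueError("need 0 <= K < n")
    return 1.0 / (1.0 - K / n)

n = 500
for K in (6, 56, 126, 252, 277):
    factor = variance_inflation(K, n)
    # Standard errors scale with the square root of the variance factor.
    print(f"K/n = {K / n:.3f}  variance x {factor:.3f}  std. error x {factor ** 0.5:.3f}")
```

The factor is close to one when $K/n$ is small and exceeds two once $K$ is more than half the sample size, which is why ignoring it matters at the upper end of the grid.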
3.2 Asymptotic Variance Estimation under Homoskedasticity

Consistent asymptotic variance estimation is useful for large sample inference. If the
assumptions of Theorem 1 are satisfied and if $\hat\Sigma_n - \Sigma_n \to_p 0$, then

$$\hat\Omega_n^{-1/2} \sqrt{n}(\hat\beta - \beta_0) \to_d \mathcal{N}(0, I_d), \qquad \hat\Omega_n = \hat\Gamma_n^{-1} \hat\Sigma_n \hat\Gamma_n^{-1},$$

implying that valid large-sample confidence intervals and hypothesis tests for linear and
nonlinear transformations of the parameter vector $\beta$ can be based on $\hat\Omega_n$.² Under
(conditional) heteroskedasticity of unknown form, constructing a consistent estimator $\hat\Sigma_n$ turns out
to be very challenging if $K/n \not\to 0$. Intuitively, the problem arises because the estimated
residuals entering the construction of $\hat\Sigma_n$ are not consistent unless $K/n \to 0$, implying that
$\hat\Sigma_n - \Sigma_n \not\to_p 0$ in general. Solving this problem is beyond the scope of this paper. Under
homoskedasticity of $\varepsilon_i$, however, the asymptotic variance simplifies and admits a
correspondingly simple consistent estimator. To describe this result, note that if $E[\varepsilon_i^2 | x_i, z_i] = \sigma_\varepsilon^2$
then $\Sigma_n = \sigma_\varepsilon^2 \Gamma_n$, where $\hat\Gamma_n - \Gamma_n \to_p 0$ by Lemma 1. It therefore suffices to find a consistent
estimator of $\sigma_\varepsilon^2$. Let

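$$s^2 = \frac{1}{n - d - K} \sum_{1 \le i \le n} \hat\varepsilon_i^2, \qquad \hat\varepsilon_i = \sum_{1 \le j \le n} M_{ij} (y_j - x_j'\hat\beta),$$

denote the usual OLS estimator of $\sigma_\varepsilon^2$ incorporating a degrees of freedom correction. The
following theorem shows that $s^2$ is a consistent estimator, even when the number of terms
is “large” relative to the sample size.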

Theorem 2 Suppose the conditions of Theorem 1 are satisfied. If $E[\varepsilon_i^2 | x_i, z_i] = \sigma_\varepsilon^2$, then
$s^2 \to_p \sigma_\varepsilon^2$ and $\hat\Sigma_n^{\mathrm{HOM}} - \Sigma_n \to_p 0$, where $\hat\Sigma_n^{\mathrm{HOM}} = s^2 \hat\Gamma_n$.

This theorem provides a distribution free, large sample justification for the degrees-of-freedom
correction required for exact inference under homoskedastic Gaussian errors.
Intuitively, accounting for the correct degrees of freedom is important whenever the number of
terms in the semi-linear model is “large” relative to the sample size.
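
A minimal sketch of the degrees-of-freedom corrected variance estimator follows. The function name `homoskedastic_se`, the randomly generated basis matrix, and the simulated design (with $g = 0$, $\beta_0 = 1$ and unit error variance) are assumptions made purely for illustration.

```python
import numpy as np

def homoskedastic_se(y, X, P):
    """Series estimate of beta_0 with s^2 and homoskedastic standard errors."""
    n, d = X.shape
    K = P.shape[1]

    def resid(A):
        # Apply M = I - P (P'P)^{-1} P' via least squares.
        return A - P @ np.linalg.lstsq(P, A, rcond=None)[0]

    MX, My = resid(X), resid(y)
    beta_hat = np.linalg.solve(MX.T @ MX, MX.T @ My)
    eps_hat = My - MX @ beta_hat                 # eps_hat_i = sum_j M_ij (y_j - x_j' beta_hat)
    s2 = eps_hat @ eps_hat / (n - d - K)         # degrees-of-freedom corrected
    gamma_hat = MX.T @ MX / n
    omega_hat = s2 * np.linalg.inv(gamma_hat)    # estimate of sigma^2 * Gamma_n^{-1}
    se = np.sqrt(np.diag(omega_hat) / n)
    return beta_hat, s2, se

rng = np.random.default_rng(4)
n, K = 400, 100
P = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
X = rng.normal(size=(n, 1))
y = X[:, 0] + rng.normal(size=n)                 # g = 0, beta_0 = 1, sigma^2 = 1
beta_hat, s2, se = homoskedastic_se(y, X, P)
```

Dividing the residual sum of squares by $n$ instead of $n - d - K$ would shrink the variance estimate by roughly the factor $1 - K/n$, here about 0.75, which is the shortfall that Theorem 2 says the correction removes.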

² Another approach to inference would be via the bootstrap. For small bandwidth asymptotics, Cattaneo,
Crump, and Jansson (2014a) showed that the standard nonparametric bootstrap does not provide a valid 
distributional approximation in general. We conjecture that the standard nonparametric bootstrap will also 
fail to provide valid inference for other alternative asymptotics frameworks. 
4 Small Simulation Study


We conducted a Monte Carlo experiment to explore the extent to which the asymptotic 
theoretical results obtained in the previous section are present in small samples. Using the 
notation already introduced, we consider the following partially linear model: 

$$y_i = x_i'\beta + g(z_i) + \varepsilon_i, \qquad E[\varepsilon_i | x_i, z_i] = 0, \qquad E[\varepsilon_i^2 | x_i, z_i] = \sigma_\varepsilon^2,$$

$$x_i = h(z_i) + v_i, \qquad E[v_i | z_i] = 0, \qquad E[v_i^2 | z_i] = \sigma_v^2(z_i),$$

where $d = 1$, $\beta = 1$, $d_z = 5$, $z_i = (z_{1i}, \ldots, z_{d_z i})'$ with $z_{\ell i} \sim$ i.i.d. Uniform$(-1, 1)$,
$\ell = 1, \ldots, d_z$. The unknown regression functions are set to $g(z_i) = h(z_i) = \exp(\|z_i\|^2)$, which
are not additively separable in the covariates $z_i$. The simulation study is based on $S = 5{,}000$
replications, each replication taking a random sample of size $n = 500$ with all random
variables generated independently. We consider 6 data generating processes (DGPs) as
follows:

Data Generating Process for Monte Carlo Experiment

                                             $(\varepsilon_i, v_i)$ Distributions
                                             Gaussian    Asymmetric    Bimodal
$\sigma_v^2(z_i) = 1$                        Model 1     Model 3       Model 5
$\sigma_v^2(z_i) = \varsigma(1 + \ldots)$    Model 2     Model 4       Model 6

Specifically, Models 1, 3 and 5 correspond to homoskedastic (in $v_i$) DGPs, while Models 2, 4
and 6 correspond to heteroskedastic (in $v_i$) DGPs. For the latter models, the constant in the
variance function was chosen so that $E[v_i^2] = 1$. The three distributions considered for the unobserved error terms $\varepsilon_i$
and $v_i$ are: the standard Normal (labelled “Gaussian”) and two Mixtures of Normals inducing
either an asymmetric or a bimodal distribution; their Lebesgue densities are depicted in
Figure 1. We explored other specifications for the regression functions, heteroskedasticity
form, and distributional assumptions, but we do not report these additional results because
they were qualitatively similar to those discussed here.

The estimators considered in the Monte Carlo experiment are constructed using power 
series approximations. We do not impose additive separability on the basis, though we do 
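restrict the interaction terms to not exceed degree 5. To be specific, we consider the following
polynomial basis expansion:

Polynomial Basis Expansion: $d_z = 5$ and $n = 500$

K      $p_K(z_i)$                                                                  K/n
6      $(1, z_{1i}, z_{2i}, z_{3i}, z_{4i}, z_{5i})'$                              0.012
11     $(p_6(z_i)', z_{1i}^2, z_{2i}^2, z_{3i}^2, z_{4i}^2, z_{5i}^2)'$            0.022
21     $p_{11}(z_i)$ + first-order interactions                                    0.042
26     $(p_{21}(z_i)', z_{1i}^3, z_{2i}^3, z_{3i}^3, z_{4i}^3, z_{5i}^3)'$         0.052
56     $p_{26}(z_i)$ + second-order interactions                                   0.112
61     $(p_{56}(z_i)', z_{1i}^4, z_{2i}^4, z_{3i}^4, z_{4i}^4, z_{5i}^4)'$         0.122
126    $p_{61}(z_i)$ + third-order interactions                                    0.252
131    $(p_{126}(z_i)', z_{1i}^5, z_{2i}^5, z_{3i}^5, z_{4i}^5, z_{5i}^5)'$        0.262
252    $p_{131}(z_i)$ + fourth-order interactions                                  0.504
257    $(p_{252}(z_i)', z_{1i}^6, z_{2i}^6, z_{3i}^6, z_{4i}^6, z_{5i}^6)'$        0.514
262    $(p_{257}(z_i)', z_{1i}^7, z_{2i}^7, z_{3i}^7, z_{4i}^7, z_{5i}^7)'$        0.524
267    $(p_{262}(z_i)', z_{1i}^8, z_{2i}^8, z_{3i}^8, z_{4i}^8, z_{5i}^8)'$        0.534
272    $(p_{267}(z_i)', z_{1i}^9, z_{2i}^9, z_{3i}^9, z_{4i}^9, z_{5i}^9)'$        0.544
277    $(p_{272}(z_i)', z_{1i}^{10}, z_{2i}^{10}, z_{3i}^{10}, z_{4i}^{10}, z_{5i}^{10})'$   0.554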
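
The values of $K$ in the table follow from simple counting: a full polynomial of total degree $q$ in $d_z$ variables has $\binom{d_z + q}{q}$ terms, and each pure-power step adds $d_z$ terms. The sketch below reproduces the grid under that reading of the table; the function name `k_grid` and its arguments are illustrative assumptions.

```python
from math import comb

def k_grid(d_z=5, full_degree=5, max_power=10):
    """Number of series terms K at each step of the nested polynomial expansion."""
    grid = [comb(d_z + 1, 1)]                  # constant plus the d_z linear terms
    for q in range(2, full_degree + 1):
        grid.append(grid[-1] + d_z)            # add the pure powers z_l^q
        grid.append(comb(d_z + q, q))          # add interactions: all monomials of degree <= q
    for q in range(full_degree + 1, max_power + 1):
        grid.append(grid[-1] + d_z)            # beyond degree 5, only pure powers are added
    return grid

n = 500
ratios = [round(K / n, 3) for K in k_grid()]
```

The jump from 131 to 252 terms when the fourth-order interactions enter illustrates how quickly $K/n$ becomes non-negligible with only five covariates.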


Thus, our simulations explore the consequences of introducing many terms in the partially
linear model by varying $K$ on the grid above from $K = 6$ to $K = 277$, which gives a range
for $K/n$ of $\{0.012, \ldots, 0.554\}$. For each point on the grid of $K/n$, we report average bias,
average standard deviation, mean square error and average standardized bias of $\hat\beta$ across
simulations. We also consider the coverage error rates and interval length for two asymptotic
95% confidence intervals:


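$$\mathrm{CI}_0 = \left[\hat\beta - \Phi^{-1}_{1-\alpha/2}\sqrt{\frac{\hat\sigma^2\,\hat\Gamma_n^{-1}}{n}},\ \ \hat\beta + \Phi^{-1}_{1-\alpha/2}\sqrt{\frac{\hat\sigma^2\,\hat\Gamma_n^{-1}}{n}}\right],$$

$$\mathrm{CI}_1 = \left[\hat\beta - \Phi^{-1}_{1-\alpha/2}\sqrt{\frac{s^2\,\hat\Gamma_n^{-1}}{n}},\ \ \hat\beta + \Phi^{-1}_{1-\alpha/2}\sqrt{\frac{s^2\,\hat\Gamma_n^{-1}}{n}}\right],$$

where $\hat\sigma^2 = (n - d - K)s^2/n$, and $\Phi^{-1}_u = \Phi^{-1}(u)$ denotes the inverse of the Gaussian
distribution function. That is, $\mathrm{CI}_0$ and $\mathrm{CI}_1$ are formed employing the t-statistic constructed using
the homoskedasticity-consistent variance estimators without and with degrees of freedom
correction, respectively.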
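The two intervals differ only in the variance estimator that enters the standard error. The sketch below illustrates this for a scalar coefficient. It assumes the standard error takes the form $\sqrt{\hat\sigma^2/(n\hat\Gamma_n)}$ with $\hat\Gamma_n = X'MX/n$, and that $s^2$ is the residual sum of squares divided by $n - d - K$, which is consistent with the relation $\hat\sigma^2 = (n - d - K)s^2/n$. The function name and its arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def homoskedastic_cis(beta_hat, rss, gamma_hat, n, d, K, alpha=0.05):
    """Return (CI_0, CI_1) for a scalar coefficient.

    rss       : residual sum of squares (assumed input)
    gamma_hat : X'MX / n, a scalar when d = 1 (assumed input)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Gaussian quantile
    sigma2_hat = rss / n                      # no degrees-of-freedom correction
    s2 = rss / (n - d - K)                    # degrees-of-freedom corrected
    se0 = sqrt(sigma2_hat / (n * gamma_hat))
    se1 = sqrt(s2 / (n * gamma_hat))
    ci0 = (beta_hat - z * se0, beta_hat + z * se0)
    ci1 = (beta_hat - z * se1, beta_hat + z * se1)
    return ci0, ci1


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the simulations.
    print(homoskedastic_cis(beta_hat=1.0, rss=222.0, gamma_hat=1.0, n=500, d=1, K=277))
```

By construction the length of the first interval is $\sqrt{(n - d - K)/n}$ times the length of the second, so the two coincide only when K/n is negligible.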

The main findings from the Monte Carlo experiment are presented in Tables 1-3. All
results are consistent with the theoretical conclusions presented in the previous section.
First, the results for standard Normal and non-Normal errors are qualitatively similar. This
indicates that the Gaussian approximation obtained in Theorem 1 is a good approximation in
finite samples, even when K is a nontrivial fraction of the sample size. Second, as expected,
a small choice of K leads to important smoothing biases. This affects the finite sample
properties of the point estimators as well as the distributional approximations obtained in
this paper. In particular, it affects the empirical size of all the confidence intervals. Third,
in all cases the results under homoskedasticity or heteroskedasticity in $v_i$ are qualitatively
similar, showing that our theoretical results provide a good finite sample approximation in
both cases, even when K is a nontrivial fraction of the sample size. Fourth, as suggested
by Theorem 2, confidence intervals without degrees of freedom correction ($\mathrm{CI}_0$) undercover,
with empirical coverage falling well below the nominal level as K/n grows, while the analogous
confidence intervals with degrees of freedom correction ($\mathrm{CI}_1$) have
close-to-correct empirical size in all cases. This result shows that the degrees of freedom
correction is crucial to achieve close-to-correct empirical size when K/n is non-negligible.
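A rough back-of-the-envelope calculation shows how large the undercoverage of the uncorrected interval should be. The sketch below assumes d = 1 and that the corrected interval attains exactly its nominal coverage in the limit, so that the uncorrected interval is the same interval shrunk by the factor $\sqrt{(n - d - K)/n}$. It ignores smoothing bias and therefore does not apply to the smallest value of K. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def implied_ci0_coverage(n, K, d=1, level=0.95):
    """Approximate coverage of CI_0 when CI_1 has exact coverage `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    shrink = sqrt((n - d - K) / n)   # ratio of CI_0 to CI_1 half-lengths
    return 2 * NormalDist().cdf(z * shrink) - 1


if __name__ == "__main__":
    for K in (56, 126, 252, 277):
        print(K / 500, round(implied_ci0_coverage(500, K), 3))
```

For K = 277 and n = 500 this gives a coverage of roughly 0.81, close to the empirical coverage of the uncorrected interval reported at K/n = 0.554 in Tables 1-3.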

In conclusion, we found in our small-scale simulation study that our theoretical results for
the partially linear model with possibly many terms provide a good approximation in samples
of moderate size. In particular, under homoskedasticity of $\varepsilon_i$, we showed that confidence
intervals constructed using $s^2$ exhibit good empirical coverage even when K/n is “large”. We
also confirmed that the Gaussian distributional approximation given in Theorem 1 represents
well the finite sample distribution of $\hat\beta$ even when K/n is “large”.

5 Conclusion 

This paper showed that many instrument asymptotics and small bandwidth asymptotics
share a common structure based on a V-statistic, with a remainder term that is
asymptotically normal when the number of terms diverges to infinity or the bandwidth shrinks
to zero. This feature is particularly useful for obtaining new results for other semiparametric
estimators. In this paper we employed this common structure to derive a new alternative
large-sample distributional approximation for a series estimator of the partially linear model,
which implies a new (larger) asymptotic variance formula.

Our results apply to a class of semiparametric estimators $\hat\beta$ satisfying

$$\sqrt{n}(\hat\beta - \beta_0) = \hat\Gamma_n^{-1} S_n + o_p(1),$$

where $\hat\Gamma_n$ and $S_n$ take a particular V-statistic form, as discussed in Section 2. This class
of semiparametric estimators covers several interesting problems, but it is by no means
exhaustive. For example, Cattaneo and Jansson (2015) show that a large class of
(kernel-based) semiparametric estimators admit an expansion of the form

where the bias term $\mathcal{B}_n$ is quantitatively and conceptually distinct from the smoothing bias
$B_n$ described in Section 2 and, crucially, dominates the quadratic term $U_n$ arising from the
V-statistic $S_n$; that is, $U_n = o_p(\mathcal{B}_n)$ in that setting. Nevertheless, the structure we have
considered in this paper is useful, providing new results for the partially linear model and a
common structure for disparate literatures on many instruments and small bandwidths.


6 Appendix: Proofs 

All statements involving conditional expectations are understood to hold almost surely.
Qualifiers such as “a.s.” will be omitted to conserve space. Throughout the appendix, C will
denote a generic constant that may take different values in each case.


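Proof of Lemma 1. Let $X = [x_1, \ldots, x_n]'$, $H = [h_1, \ldots, h_n]'$, and $V = [v_1, \ldots, v_n]'$. By
Assumption PLM and the Markov inequality,

$$\mathrm{tr}\Big(\frac{1}{n}H'MH\Big) = \min_{\gamma}\ \frac{1}{n}\sum_{1\le i\le n}\big\|h(z_i) - \gamma' p_K(z_i)\big\|^2 \to_p 0.$$

Also, $V'V/n = O_p(1)$ by Assumption PLM and the Markov inequality, so by the Cauchy-Schwarz
inequality and $M$ idempotent, $\|H'MV/n\| \le [\mathrm{tr}(H'MH/n)\,\mathrm{tr}(V'V/n)]^{1/2} \to_p 0$. By
the triangle inequality, we then have

$$\hat\Gamma_n = \frac{1}{n}X'MX = \frac{1}{n}(V + H)'M(V + H) = \frac{1}{n}V'MV + o_p(1).$$

Next, by Lemma A1 of Chao, Swanson, Hausman, Newey, and Woutersen (2012),

$$\frac{1}{n}V'MV = \frac{1}{n}\sum_{1\le i\le n} M_{ii}v_iv_i' + \frac{1}{n}\sum_{1\le i,j\le n,\ i\ne j} M_{ij}v_iv_j' = \frac{1}{n}\sum_{1\le i\le n} M_{ii}v_iv_i' + o_p(1).$$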
Finally, by the Markov inequality and using $\mathbb{E}[n^{-1}\sum_{1\le i\le n} M_{ii}v_iv_i' \mid Z] = \Gamma_n$,

$$\frac{1}{n}\sum_{1\le i\le n} M_{ii}v_iv_i' - \Gamma_n \to_p 0$$

because Assumption PLM implies that $v_iv_i'$ and $v_jv_j'$ are uncorrelated conditional on $Z$ and
that $\mathbb{E}[\|v_i\|^4 \mid Z] \le C$. ■
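The algebraic steps in this proof can be checked numerically. The sketch below uses NumPy with a simulated basis matrix (an assumption for illustration; the dimensions and the choice of H are arbitrary) to confirm three facts used above: the trace of H'MH equals the minimized least-squares objective, V'MV splits exactly into its diagonal and off-diagonal sums, and the diagonal of M sums to n - K.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, d = 200, 40, 2

# Hypothetical basis matrix P (n x K) and annihilator M = I - P (P'P)^{-1} P'.
P = rng.standard_normal((n, K))
M = np.eye(n) - P @ np.linalg.solve(P.T @ P, P.T)

H = np.sin(P[:, :d])              # arbitrary smooth function of the first d columns
V = rng.standard_normal((n, d))

# tr(H'MH) is the minimized sum of squared approximation errors.
gamma, *_ = np.linalg.lstsq(P, H, rcond=None)
ls_objective = np.sum((H - P @ gamma) ** 2)
trace_form = np.trace(H.T @ M @ H)

# V'MV = sum_i M_ii v_i v_i' + sum_{i != j} M_ij v_i v_j'.
m_diag = np.diag(M)
diag_part = (V * m_diag[:, None]).T @ V
offdiag_part = V.T @ (M - np.diag(m_diag)) @ V

print(np.isclose(trace_form, ls_objective))
print(np.allclose(V.T @ M @ V, diag_part + offdiag_part))
print(np.isclose(np.trace(M), n - K))
```

The decomposition is an exact identity; the content of the lemma is that the off-diagonal sum is asymptotically negligible after division by n, which a single draw cannot verify.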

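Proof of Lemma 2. Let $G = [g_1, \ldots, g_n]'$ and $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]'$. By the Cauchy-Schwarz
inequality, $M$ idempotent, Assumption PLM, and the Markov inequality,

$$\Big\|\frac{1}{n}G'MH\Big\| \le \sqrt{\mathrm{tr}\Big(\frac{1}{n}G'MG\Big)}\,\sqrt{\mathrm{tr}\Big(\frac{1}{n}H'MH\Big)} = O_p(K^{-\alpha_g-\alpha_h}),$$

which gives $B_n = G'MH/\sqrt{n} = O_p(\sqrt{n}K^{-\alpha_g-\alpha_h})$.

Also, $R_n = (V'MG + H'M\varepsilon)/\sqrt{n} = O_p(K^{-\alpha_g} + K^{-\alpha_h}) = o_p(1)$ because

$$\mathbb{E}\Big[\Big\|\frac{1}{\sqrt{n}}V'MG\Big\|^2 \,\Big|\, Z\Big] = \frac{1}{n}G'M\,\mathbb{E}[VV' \mid Z]\,MG \le C\,\frac{1}{n}G'MG = O_p(K^{-2\alpha_g})$$

and

$$\mathbb{E}\Big[\Big\|\frac{1}{\sqrt{n}}H'M\varepsilon\Big\|^2 \,\Big|\, Z\Big] = \mathrm{tr}\Big(\frac{1}{n}H'M\,\mathbb{E}[\varepsilon\varepsilon' \mid Z]\,MH\Big) \le C\,\mathrm{tr}\Big(\frac{1}{n}H'MH\Big) = O_p(K^{-2\alpha_h})$$

by Assumption PLM and the Markov inequality. ■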
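The first bound in this proof is a Cauchy-Schwarz inequality for the cross term G'MH, which relies on M being symmetric and idempotent. A minimal numerical illustration follows, using NumPy and simulated inputs; the helper names and dimensions are hypothetical and chosen only for the example.

```python
import numpy as np


def annihilator(P):
    """M = I - P (P'P)^{-1} P' for a full-column-rank basis matrix P."""
    return np.eye(P.shape[0]) - P @ np.linalg.solve(P.T @ P, P.T)


def cross_term_and_bound(G, H, M):
    """Return (||G'MH/n||, sqrt(tr(G'MG/n) * tr(H'MH/n)))."""
    n = M.shape[0]
    lhs = np.linalg.norm(G.T @ M @ H / n)   # Frobenius norm of the cross term
    rhs = np.sqrt(np.trace(G.T @ M @ G / n) * np.trace(H.T @ M @ H / n))
    return lhs, rhs


rng = np.random.default_rng(1)
n, K, d = 300, 30, 2
M = annihilator(rng.standard_normal((n, K)))
G = rng.standard_normal((n, 1))
H = rng.standard_normal((n, d))

lhs, rhs = cross_term_and_bound(G, H, M)
print(lhs <= rhs)
```

Because G'MH = (MG)'(MH), the inequality holds for any G and H; the rates in the lemma then follow from how fast each trace shrinks as K grows.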

Proof of Theorem 1. By Lemma A2 of Chao, Swanson, Hausman, Newey, and
Woutersen (2012),

$$\Sigma_n^{-1/2}\,\frac{1}{\sqrt{n}}\sum_{1\le i,j\le n} M_{ij}v_i\varepsilon_j \to_d \mathcal{N}(0, I_d)$$

under Assumption PLM. Combining this result with Lemmas 1 and 2, we obtain the results
stated in the theorem. ■

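Proof of Theorem 2. Let $Y = [y_1, \ldots, y_n]'$ and $\hat\varepsilon = [\hat\varepsilon_1, \ldots, \hat\varepsilon_n]' = M(Y - X\hat\beta)$. It
follows similarly to the proof of Lemma 1 that

$$\frac{1}{n}\varepsilon'M\varepsilon = \frac{1}{n}\sum_{1\le i\le n} M_{ii}\varepsilon_i^2 + \frac{1}{n}\sum_{1\le i,j\le n,\ i\ne j}\varepsilon_iM_{ij}\varepsilon_j = \frac{1}{n}\sum_{1\le i\le n} M_{ii}\mathbb{E}[\varepsilon_i^2 \mid z_i] + o_p(1) = \frac{n-K}{n}\sigma^2 + o_p(1),$$

so it suffices to show that $\hat\varepsilon'\hat\varepsilon/n = \varepsilon'M\varepsilon/n + o_p(1)$.

Lemma 1 and $\hat\beta - \beta = o_p(1)$ imply $(\hat\beta - \beta)'X'MX(\hat\beta - \beta)/n = o_p(1)$, which together
with the Cauchy-Schwarz inequality and $\varepsilon'M\varepsilon/n = O_p(1)$ gives

$$\frac{1}{n}(Y - X\hat\beta - G)'M(Y - X\hat\beta - G) = \frac{1}{n}\varepsilon'M\varepsilon + \frac{1}{n}(\hat\beta - \beta)'X'MX(\hat\beta - \beta) - \frac{2}{n}\varepsilon'MX(\hat\beta - \beta) = \frac{1}{n}\varepsilon'M\varepsilon + o_p(1).$$

Similarly, $G'MG/n = o_p(1)$ together with $(Y - X\hat\beta - G)'M(Y - X\hat\beta - G)/n = O_p(1)$ and
the Cauchy-Schwarz inequality gives

$$\frac{1}{n}\hat\varepsilon'\hat\varepsilon = \frac{1}{n}(Y - X\hat\beta)'M(Y - X\hat\beta) = \frac{1}{n}(Y - X\hat\beta - G)'M(Y - X\hat\beta - G) + o_p(1).$$

The conclusion follows by the triangle inequality. ■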
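The two exact identities behind this argument are the quadratic expansion of the residual form around the true errors and the fact that the residual sum of squares is a quadratic form in M. The sketch below checks both with NumPy on simulated data for a scalar regressor; the data-generating choices are arbitrary assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 250, 50
beta = 1.0

P = rng.standard_normal((n, K))                    # hypothetical basis matrix
M = np.eye(n) - P @ np.linalg.solve(P.T @ P, P.T)  # annihilator of the basis

X = rng.standard_normal((n, 1))
G = rng.standard_normal((n, 1))                    # stands in for g(z_i)
eps = rng.standard_normal((n, 1))
Y = X * beta + G + eps

# Partialling-out estimator: beta_hat = (X'MX)^{-1} X'MY.
beta_hat = np.linalg.solve(X.T @ M @ X, X.T @ M @ Y).item()
delta = beta_hat - beta

# (Y - X beta_hat - G)'M(Y - X beta_hat - G), expanded around eps.
u = Y - X * beta_hat - G
quad_form = (u.T @ M @ u).item()
expansion = (
    (eps.T @ M @ eps).item()
    + delta ** 2 * (X.T @ M @ X).item()
    - 2 * delta * (eps.T @ M @ X).item()
)

# Residual sum of squares as a quadratic form in M.
eps_hat = M @ (Y - X * beta_hat)
rss = (eps_hat.T @ eps_hat).item()
rss_quad = ((Y - X * beta_hat).T @ M @ (Y - X * beta_hat)).item()

print(np.isclose(quad_form, expansion), np.isclose(rss, rss_quad))
```

Both identities hold exactly for any value of the estimator; the proof only needs the last two terms of the expansion, and the contribution of G, to vanish after division by n.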
[Figure 1: Lebesgue densities of the error distributions. Panel title: Normal Distribution.]







Table 1: Simulation Results, Models 1-2, Gaussian Distribution.

(a) Model 1: Homoskedastic $v_i$

K/n       Bias       SD     RMSE  Bias/SD     CI_0     CI_1  $\hat\sigma$        s
0.012    0.481    0.040    0.483   11.898    0.000    0.000         0.039    0.039
0.022    0.001    0.045    0.045    0.031    0.947    0.950         0.045    0.045
0.042    0.002    0.047    0.047    0.051    0.939    0.945         0.045    0.046
0.052    0.002    0.046    0.046    0.049    0.940    0.947         0.045    0.046
0.112    0.002    0.047    0.047    0.041    0.936    0.952         0.045    0.048
0.122    0.000    0.048    0.048    0.005    0.935    0.949         0.045    0.048
0.252    0.001    0.052    0.052    0.013    0.907    0.947         0.045    0.052
0.262    0.000    0.052    0.052   -0.008    0.904    0.949         0.045    0.052
0.504    0.000    0.063    0.063    0.003    0.841    0.951         0.045    0.064
0.514    0.000    0.064    0.064   -0.002    0.828    0.947         0.045    0.064
0.524    0.000    0.064    0.064   -0.003    0.827    0.948         0.045    0.065
0.534    0.000    0.066    0.066   -0.003    0.821    0.950         0.045    0.066
0.544    0.001    0.068    0.068    0.010    0.803    0.946         0.045    0.067
0.554    0.000    0.067    0.067    0.004    0.808    0.949         0.045    0.067

(b) Model 2: Heteroskedastic $v_i$

K/n       Bias       SD     RMSE  Bias/SD     CI_0     CI_1  $\hat\sigma$        s
0.012    0.483    0.046    0.485   10.460    0.000    0.000         0.039    0.040
0.022    0.002    0.045    0.045    0.034    0.949    0.953         0.045    0.046
0.042    0.001    0.046    0.046    0.015    0.946    0.949         0.045    0.046
0.052    0.002    0.046    0.046    0.034    0.947    0.955         0.045    0.046
0.112    0.001    0.049    0.049    0.015    0.932    0.950         0.045    0.048
0.122    0.001    0.049    0.049    0.025    0.929    0.946         0.045    0.049
0.252    0.000    0.052    0.052    0.009    0.914    0.951         0.046    0.053
0.262    0.001    0.053    0.053    0.025    0.915    0.952         0.046    0.054
0.504    0.000    0.068    0.068    0.002    0.827    0.947         0.048    0.068
0.514    0.001    0.068    0.068    0.019    0.829    0.953         0.048    0.068
0.524    0.003    0.068    0.069    0.050    0.824    0.953         0.047    0.069
0.534    0.000    0.070    0.070    0.003    0.819    0.949         0.048    0.070
0.544    0.002    0.070    0.070    0.024    0.819    0.948         0.048    0.071
0.554    0.000    0.074    0.074   -0.004    0.801    0.943         0.048    0.072

Notes:

(i) columns Bias, SD, RMSE and Bias/SD report, respectively, average bias, average standard deviation, root
mean square error, and average standardized bias of the estimator $\hat\beta$ across simulations;

(ii) columns CI_0 and CI_1 report empirical coverage for homoskedasticity-consistent confidence intervals,
respectively, without and with degrees of freedom correction;

(iii) columns $\hat\sigma$ and s report the average across simulations of the standard error estimators, respectively,
without and with degrees of freedom correction.
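The summary columns are linked by simple arithmetic, which gives a quick consistency check on the reported figures. The sketch below assumes that RMSE combines bias and standard deviation as the square root of Bias squared plus SD squared, and compares the ratio Bias/SD with the reported standardized bias for the K/n = 0.012 rows of Table 1. Agreement can only be approximate because the inputs are rounded to three decimals and the reported column is an average across simulations.

```python
from math import sqrt

# (Bias, SD, RMSE, Bias/SD) for the K/n = 0.012 rows of Table 1.
rows = {
    "Model 1": (0.481, 0.040, 0.483, 11.898),
    "Model 2": (0.483, 0.046, 0.485, 10.460),
}


def rmse(bias, sd):
    """Root mean square error implied by a bias and a standard deviation."""
    return sqrt(bias ** 2 + sd ** 2)


if __name__ == "__main__":
    for name, (bias, sd, reported_rmse, reported_ratio) in rows.items():
        print(name, round(rmse(bias, sd), 3), reported_rmse,
              round(bias / sd, 2), reported_ratio)
```

For these rows the squared bias accounts for almost all of the mean square error, which is the smoothing bias at small K discussed in the text.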
Table 2: Simulation Results, Models 3-4, Asymmetric Distribution.

(a) Model 3: Homoskedastic $v_i$

K/n       Bias       SD     RMSE  Bias/SD     CI_0     CI_1  $\hat\sigma$        s
0.012    0.481    0.039    0.483   12.486    0.000    0.000         0.038    0.038
0.022    0.002    0.043    0.043    0.040    0.943    0.946         0.042    0.042
0.042    0.001    0.044    0.044    0.032    0.942    0.947         0.042    0.043
0.052    0.001    0.043    0.043    0.023    0.946    0.954         0.042    0.043
0.112    0.001    0.045    0.045    0.023    0.931    0.947         0.042    0.044
0.122    0.002    0.045    0.045    0.036    0.936    0.951         0.042    0.045
0.252    0.001    0.049    0.049    0.013    0.902    0.950         0.042    0.048
0.262    0.001    0.049    0.049    0.013    0.915    0.953         0.042    0.049
0.504    0.000    0.060    0.060    0.001    0.829    0.950         0.042    0.059
0.514    0.000    0.060    0.060   -0.007    0.828    0.948         0.042    0.060
0.524    0.000    0.060    0.060   -0.006    0.830    0.952         0.042    0.061
0.534    0.000    0.061    0.061   -0.001    0.819    0.950         0.042    0.061
0.544    0.000    0.062    0.062    0.000    0.809    0.951         0.042    0.062
0.554    0.001    0.064    0.064    0.009    0.794    0.944         0.042    0.063

(b) Model 4: Heteroskedastic $v_i$

K/n       Bias       SD     RMSE  Bias/SD     CI_0     CI_1  $\hat\sigma$        s
0.012    0.485    0.046    0.488   10.566    0.000    0.000         0.038    0.038
0.022    0.001    0.042    0.042    0.031    0.947    0.949         0.042    0.043
0.042    0.001    0.043    0.043    0.025    0.946    0.951         0.042    0.043
0.052    0.002    0.044    0.044    0.047    0.937    0.943         0.042    0.043
0.112    0.002    0.045    0.045    0.037    0.933    0.945         0.043    0.045
0.122    0.001    0.046    0.046    0.025    0.929    0.945         0.043    0.046
0.252    0.000    0.050    0.050   -0.004    0.910    0.949         0.043    0.050
0.262    0.001    0.050    0.050    0.020    0.907    0.951         0.043    0.050
0.504    0.000    0.064    0.064   -0.002    0.832    0.947         0.045    0.064
0.514    0.001    0.065    0.065    0.008    0.827    0.948         0.045    0.064
0.524   -0.001    0.065    0.065   -0.015    0.817    0.948         0.045    0.065
0.534    0.001    0.066    0.066    0.013    0.824    0.948         0.045    0.065
0.544    0.000    0.067    0.067   -0.002    0.799    0.951         0.045    0.066
0.554    0.000    0.067    0.067   -0.001    0.811    0.948         0.045    0.067

Notes:

(i) columns Bias, SD, RMSE and Bias/SD report, respectively, average bias, average standard deviation, root
mean square error, and average standardized bias of the estimator $\hat\beta$ across simulations;

(ii) columns CI_0 and CI_1 report empirical coverage for homoskedasticity-consistent confidence intervals,
respectively, without and with degrees of freedom correction;

(iii) columns $\hat\sigma$ and s report the average across simulations of the standard error estimators, respectively,
without and with degrees of freedom correction.
Table 3: Simulation Results, Models 5-6, Bimodal Distribution.

(a) Model 5: Homoskedastic $v_i$

K/n       Bias       SD     RMSE  Bias/SD     CI_0     CI_1  $\hat\sigma$        s
0.012    0.482    0.058    0.486    8.340    0.000    0.000         0.059    0.059
0.022    0.001    0.076    0.076    0.009    0.948    0.950         0.076    0.077
0.042    0.001    0.078    0.078    0.008    0.944    0.948         0.076    0.077
0.052   -0.001    0.078    0.078   -0.010    0.940    0.948         0.076    0.078
0.112    0.002    0.081    0.081    0.026    0.930    0.946         0.076    0.080
0.122    0.001    0.080    0.080    0.018    0.936    0.953         0.076    0.081
0.252    0.002    0.088    0.088    0.026    0.912    0.949         0.076    0.088
0.262    0.001    0.087    0.087    0.008    0.908    0.952         0.076    0.088
0.504   -0.001    0.109    0.109   -0.013    0.827    0.950         0.076    0.108
0.514    0.001    0.108    0.108    0.012    0.832    0.953         0.076    0.109
0.524    0.000    0.110    0.110    0.003    0.825    0.948         0.076    0.110
0.534   -0.004    0.110    0.110   -0.033    0.818    0.950         0.076    0.111
0.544    0.001    0.111    0.111    0.012    0.819    0.949         0.076    0.112
0.554   -0.001    0.111    0.111   -0.006    0.817    0.956         0.076    0.114

(b) Model 6: Heteroskedastic $v_i$

K/n       Bias       SD     RMSE  Bias/SD     CI_0     CI_1  $\hat\sigma$        s
0.012    0.483    0.062    0.487    7.811    0.000    0.000         0.059    0.060
0.022    0.001    0.077    0.077    0.011    0.945    0.948         0.076    0.077
0.042    0.001    0.077    0.077    0.011    0.945    0.951         0.076    0.078
0.052   -0.001    0.079    0.079   -0.009    0.941    0.948         0.077    0.079
0.112    0.000    0.082    0.082    0.001    0.938    0.954         0.077    0.082
0.122    0.004    0.080    0.080    0.046    0.942    0.955         0.077    0.082
0.252    0.000    0.092    0.092    0.002    0.904    0.946         0.078    0.090
0.262    0.002    0.089    0.089    0.026    0.910    0.957         0.078    0.091
0.504   -0.001    0.117    0.117   -0.005    0.826    0.946         0.080    0.114
0.514   -0.002    0.116    0.116   -0.017    0.828    0.951         0.081    0.116
0.524    0.000    0.118    0.118    0.003    0.821    0.945         0.081    0.117
0.534    0.001    0.118    0.118    0.010    0.815    0.953         0.081    0.119
0.544    0.000    0.119    0.119   -0.003    0.816    0.952         0.081    0.120
0.554    0.000    0.125    0.125    0.001    0.797    0.943         0.081    0.121

Notes:

(i) columns Bias, SD, RMSE and Bias/SD report, respectively, average bias, average standard deviation, root
mean square error, and average standardized bias of the estimator $\hat\beta$ across simulations;

(ii) columns CI_0 and CI_1 report empirical coverage for homoskedasticity-consistent confidence intervals,
respectively, without and with degrees of freedom correction;

(iii) columns $\hat\sigma$ and s report the average across simulations of the standard error estimators, respectively,
without and with degrees of freedom correction.